Ειδικά Μαθήματα Βοτανικής

## Warning: package 'rmdformats' was built under R version 4.0.5

All files required for this lab are freely available online.

1 Before we start

Before you start downloading data from the internet, you need to create a project in R-Studio. The name of the project, as well as its file path should NOT include the following:
1. Greek letters
2. spaces (replace the spaces with _)

For example your file path could be: C:/Environmental_Data_Analysis/Geographical_Data.

You also need to install and load all the packages that we are going to use today1.

2 Geographical data

2.1 Introduction

In general, geographical data can be divided in two main categories:

1. Vector data
2. Raster data

2.1.1 Vector data

Vector data can be further subdivided into three categories:

1. Points
2. Lines
3. Polygons

These data have discrete, well-defined borders and usually have a high level of precision, but are not necessarily accurate.

Points located within a coordinate reference system (CRS) constitude the basis of vector data. Points can either be self-standing features (e.g., Ioannina city) or they can be linked together to form more complex geometries, such as lines (e.g., the national road connecting Ioannina to Thessaloniki) and polygons (e.g., the area covered by the city of Ioannina)2.

The main packages we are going to use when dealing with vector data are the following:

1. sf
2. sp
3. rgeos
4. rgdal

You can see the sf vignettes here and the sp vignettes here.

A great resource for spatial data analysis is this book by Bivand et al. (2008), as well as this one3.

2.1.2 Raster data

On the other hand, raster data divide the surface up into cells of constant size (e.g., 1 x 1 km2 cells). Raster datasets are the basis of background images used in web-mapping and have been a vital source of geographic data since the origins of aerial photography and satellite-based remote sensing devices. Rasters aggregate spatially specific features to a given resolution, meaning that they are consistent over space and scalable (Lovelace et al., 2019).

he geographic raster data model usually consists of a raster header and a matrix (with rows and columns) representing equally spaced cells (i.e., pixels). The raster header defines the coordinate reference system, the extent and the origin. The header defines the extent via the number of columns, the number of rows and the cell size resolution. Hence, starting from the origin, we can easily access and modify each single cell by either using the ID of a cell or by explicitly specifying the rows and columns. In contrast to vector data, the cell of one raster layer can only hold a single value. The value might be numeric or categorical.

Raster maps usually represent continuous phenomena, such as elevation, temperature, population density or spectral data. Discrete features such as soil or land-cover classes can also be represented as raster data. Consequently, the discrete borders of these features become blurred, and depending on the spatial task a vector representation might be more suitable.

In general, there are three raster classes:

1. RasterLayer
2. RasterBrick
3. RasterStack

The RasterLayer class represents the simplest form of a raster object, and consists of only one layer.

Both RasterBrick and RasterStack classes can handle multiple layers, but differ regarding the number of supported file formats, type of internal representation and processing speed.

A RasterBrick consists of multiple layers, which typically correspond to a single multispectral satellite file or a single multilayer object in memory.

A RasterStack is similar to a RasterBrick in the sense that it consists also of multiple layers. However, in contrast to RasterBrick, RasterStack allows you to connect several raster objects stored in different files or multiple objects in memory. More specifically, a RasterStack is a list of RasterLayer objects with the same extent and resolution (Lovelace et al., 2019).

3 Download vector data

3.1 Install and load packages

We will need to install several libraries first.

## ===========================================================================##
## Install the main packages
## ===========================================================================##
install.packages(c("tidyverse", "rgbif", "raster", "data.table", "rgeos", "rgdal", 
    "wdpar", "countrycode", "devtools", "sf", "usdm", "speciesgeocodeR", "pacman", 
    "dismo", "CoordinateCleaner", "ggspatial", "scales", "scrubr", "mapr", "rasterVis"), 
    dependencies = T)

devtools::install_github("rinat", "ropensci")
## ===========================================================================##

And then we will have to load those libraries:

## ===========================================================================##
## Load them
## ===========================================================================##
pacman::p_load(tidyverse, rgbif, raster, usdm, rgeos, rgdal, rinat, wdpar, countrycode, 
    devtools, sf, speciesgeocodeR, pacman, dismo, CoordinateCleaner, ggspatial, scales, 
    scrubr, mapr, rasterVis, data.table)
## package 'rinat' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\User\AppData\Local\Temp\Rtmp0IIwkq\downloaded_packages
## package 'binman' successfully unpacked and MD5 sums checked
## package 'semver' successfully unpacked and MD5 sums checked
## package 'wdman' successfully unpacked and MD5 sums checked
## package 'RSelenium' successfully unpacked and MD5 sums checked
## package 'wdpar' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\User\AppData\Local\Temp\Rtmp0IIwkq\downloaded_packages
## package 'sparsesvd' successfully unpacked and MD5 sums checked
## package 'docopt' successfully unpacked and MD5 sums checked
## package 'qlcMatrix' successfully unpacked and MD5 sums checked
## package 'hoardr' successfully unpacked and MD5 sums checked
## package 'scrubr' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\User\AppData\Local\Temp\Rtmp0IIwkq\downloaded_packages
## package 'gistr' successfully unpacked and MD5 sums checked
## package 'mapr' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\User\AppData\Local\Temp\Rtmp0IIwkq\downloaded_packages
## ===========================================================================##

3.2 Polygon data

3.2.1 Country boundaries

Then we can download some vector data in the form of polygons. We can use the online database GADM to download country boundaries.

## ===========================================================================##
## Download the data for Greece
## ===========================================================================##
## There are three levels available, you are free to explore them
Greece <- getData("GADM", country = "GRC", level = 3)
## ===========================================================================##

We can easily plot the data we have just downloaded.

## ===========================================================================##
## Plot Greece
## ===========================================================================##
plot(Greece)

It is equally easy to subset those data.

## ===========================================================================##
## First we need to see what is inside the object we created
## ===========================================================================##
Greece
## class       : SpatialPolygonsDataFrame 
## features    : 326 
## extent      : 19.37236, 29.6457, 34.80069, 41.74801  (xmin, xmax, ymin, ymax)
## crs         : +proj=longlat +datum=WGS84 +no_defs 
## variables   : 16
## names       : GID_0, NAME_0,   GID_1,                      NAME_1,                                                                                        NL_NAME_1,     GID_2,         NAME_2,                        NL_NAME_2,       GID_3,   NAME_3,          VARNAME_3,                                                     NL_NAME_3, TYPE_3,    ENGTYPE_3, CC_3, ... 
## min values  :   GRC, Greece, GRC.1_1,                      Aegean,                                                                 <U+0386><U+03B8><U+03C9><U+03C2>, GRC.1.1_1,          Athos, <U+0386><U+03B8><U+03C9><U+03C2>, GRC.1.1.1_1,   Abdera, Abelokipi-Menemeni,                              <U+0386><U+03B8><U+03C9><U+03C2>,  Dímos, Municipality,   NA, ... 
## max values  :   GRC, Greece, GRC.8_1, Thessaly and Central Greece, Tessa<U+03BB><U+03AF>a <U+03BA>a<U+03B9> Ste<U+03C1>e<U+03AC> <U+0395><U+03BB><U+03BB><U+03AC>da, GRC.8.2_1, West Macedonia,           Tessa<U+03BB><U+03AF>a, GRC.8.2.9_1, Zografou,           Zografos, Tessa<U+03BB><U+03BF><U+03BD><U+03AF><U+03BA><U+03B7><U+03C2>,  Dímos, Municipality,   NA, ...

We see that our object has a Coordinate Reference System (crs) and that it includes several columns (as indicated by the slot names).

Next, we are going to see how many columns our object includes:

names(Greece)
##  [1] "GID_0"     "NAME_0"    "GID_1"     "NAME_1"    "NL_NAME_1" "GID_2"    
##  [7] "NAME_2"    "NL_NAME_2" "GID_3"     "NAME_3"    "VARNAME_3" "NL_NAME_3"
## [13] "TYPE_3"    "ENGTYPE_3" "CC_3"      "HASC_3"

Finally, we are going to create a new spatial object by subsetting Greece. We do so, by using one of the columns present in Greece.

Crete <- subset(Greece, NAME_1 == "Crete")

Let’s plot Crete now

## ===========================================================================##
## Plot Crete
## ===========================================================================##
plot(Crete)

3.2.2 Protected areas

We can also download data from the World Database on Protected areas, the most comprehensive global dataset of protected areas. It is used to monitor the performance of existing protected areas and pinpoint priority areas for establishing new protected areas. We will use the wdpar package to download data for Greece.

## ===========================================================================##
## Download protected area data for Greece (it might take a while)
## ===========================================================================##
gr_raw_pa_data <- wdpa_fetch("Greece", wait = TRUE)
## ===========================================================================##

## ===========================================================================##
## Let's see what we have downloaded
## ===========================================================================##
## head(gr_raw_pa_data)

If you need more information on what these columns mean, please refer to the WPDA manual.

We need to change the CRS for plotting. Currently the CRS of gr_raw_pa_data is set to Greek Grid (else known as EGSA 87). We need to change it to WGS 84 (also known as World Geodetic System). We will use the function st_transform from the sf package. The number 4326 refers to WGS 84 in the European Data Petroleum System or EPSG.

## ===========================================================================##
## Reproject data to longitude/latitude for plotting
## ===========================================================================##
gr_pa_data <- st_transform(gr_raw_pa_data, 4326)

As a next step, we will clip the terrestrial protected areas to the Greek coastline. First, in order to be on the safe side, we will perform some sanity checks on Greek boundaries and then we will do the clipping.

##===========================================================================##
## Sanity checks
##===========================================================================##
gr_boundary_data <- Greece %>% st_as_sf() %>% 
  st_set_precision(1000) %>%
  lwgeom::st_make_valid() %>%
  st_set_precision(1000) %>%
  st_combine() %>%
  st_union() %>%
  st_set_precision(1000) %>%
  lwgeom::st_make_valid() %>%
  st_transform(st_crs(gr_pa_data)) %>%
  lwgeom::st_make_valid()
##===========================================================================##


##===========================================================================##
## clip Greece's protected areas to the coastline
##===========================================================================##
gr_pa_data <- gr_pa_data %>%
  filter(MARINE == "terrestrial") %>%
  st_intersection(gr_boundary_data) %>%
  rbind(gr_pa_data %>%
          filter(MARINE == "marine") %>%
          st_difference(gr_boundary_data)) %>%
  rbind(gr_pa_data %>% filter(!MARINE %in% c("terrestrial",
                                              "marine"))) 
##===========================================================================##

Now, we can recalculate the extent of each protected area.

## ===========================================================================##
## Recalculate extent
## ===========================================================================##
gr_pa_data <- gr_pa_data %>% mutate(AREA_KM2 = as.numeric(st_area(.)) * 1e-06)  ## What is this number?
## ===========================================================================##

As a next step, we can also calculate the percentage of land inside its protected area system that are managed under different categories, as defined by IUCN).

##===========================================================================##
## Calculate the percentage of land inside the PA system for each IUCN category
##===========================================================================##
statistic <- gr_pa_data %>%
  as.data.frame() %>%
  dplyr::select(-geometry) %>%
  group_by(IUCN_CAT) %>%
  summarize(area_km = sum(AREA_KM2)) %>%
  ungroup() %>%
  mutate(percentage = (area_km / sum(area_km)) * 100) %>%
  arrange(desc(area_km))
##===========================================================================##


##===========================================================================##
## Print it
##===========================================================================##
statistic

Now we can keep ony those PAs that have been assigned to each of the IUCN categories (I-VI) and discard all other PAs.

## ===========================================================================##
## Keep certain PAs only
## ===========================================================================##
gr_pa_data_filtered <- gr_pa_data %>% filter(!str_detect(IUCN_CAT, "Not"))
## ===========================================================================##

Let’s visualise the results.

## ===========================================================================##
## First convert Greece and Crete to sf objects
## ===========================================================================##
Greece_sf <- st_as_sf(Greece)
Crete_sf <- st_as_sf(Crete)
## ===========================================================================##


## ===========================================================================##
## Then do the plotting
## ===========================================================================##
ggplot(Greece_sf) + geom_sf(fill = "antiquewhite1") + geom_sf() + geom_sf(data = Greece_sf, 
    fill = NA) + geom_sf(aes(fill = IUCN_CAT), data = gr_pa_data_filtered, inherit.aes = FALSE) + 
    theme(axis.title = element_blank(), legend.position = "bottom")

Now we can restrict the analysis to Crete. We will clip the PA data to the extent of Crete. In order to do so, we will use the st_intersection function from the sf package and then we will visualise the results.

## ===========================================================================##
## Restrict the analysis to Crete
## ===========================================================================##
pa_crete <- st_intersection(gr_pa_data_filtered, Crete_sf)
## ===========================================================================##


## ===========================================================================##
## Then do the plotting
## ===========================================================================##
ggplot(Crete_sf) + geom_sf(fill = "antiquewhite1") + geom_sf() + geom_sf(data = Crete_sf, 
    fill = NA) + geom_sf(aes(fill = IUCN_CAT), data = pa_crete, inherit.aes = FALSE) + 
    theme(axis.title = element_blank(), legend.position = "bottom")

Finally, if we want to see all the PAs in Crete, we will do the following:

## ===========================================================================##
## Restrict the analysis to Crete
## ===========================================================================##
pa_crete_all <- st_intersection(gr_pa_data, Crete_sf)
## ===========================================================================##


## ===========================================================================##
## Then do the plotting
## ===========================================================================##
ggplot(Crete_sf) + geom_sf(fill = "antiquewhite1") + geom_sf() + geom_sf(data = Crete_sf, 
    fill = NA) + geom_sf(aes(fill = IUCN_CAT), data = pa_crete_all, inherit.aes = FALSE) + 
    theme(axis.title = element_blank(), legend.position = "bottom")

3.3 Point data

As we have mentioned earlier, the most basic category of vector data are points. In ecology, points most often represent sampling locations or field observation in general. These observations consist primarily species presence in one site. Nowadays, it is fairly easy to obtain species occurrence data through freely available, online databases, such as GBIF, BIEN and iNaturalist. Bear in mind these databases refer to current/recent species occurrences, yet there are others, such as the Paleobiology and Neotoma databases, that contain information regarding fossil data.

Today we are going to explore the GBIF and the iNaturalist databases, but you are free to explore the other ones as well. The former contains more than 1.3 billion records, which can be easily accessed for ecological analyses, while the latter holds significantly less records and is essentially a citizen-science database. The ease of access and use is slightly counterbalanced by the fact that those databases contain several erroneous occurrences (e.g., misidentifications in the case of iNaturalist, coordinates falling into the sea in the case of GBIF - see Maldonado et al.(2015) for more).

Let’s first download some point data for Petromarula pinnata, one of the six endemic genera of Greece.

Fig 1. The endemic species Petromarula pinnata from Crete.

3.3.1 iNaturalist

First we will download data from the iNaturalist database. In order to merge those data with the one we will download from GBIF, we will have to add a column named countryCode and remove any rows with no geographical information4.

## ===========================================================================##
## Download data and create an additional column
## ===========================================================================##
petromarula_inat <- get_inat_obs(taxon_name = "Petromarula pinnata") %>% mutate(countryCode = "GR")
## ===========================================================================##


## ===========================================================================##
## Remove any rows without coordinates
## ===========================================================================##
petromarula_inat <- petromarula_inat[complete.cases(petromarula_inat[, 5:6]), ]

## Why do we use the fifth and sixth column to remove any empty rows?
## ===========================================================================##

Next, we will have to perform some checks in order to remove any erroneous or suspicious coordinates.

## ===========================================================================##
## Clean the coordinates
## ===========================================================================##
flags_rinat <- clean_coordinates(x = petromarula_inat, lon = "longitude", lat = "latitude", 
    countries = "countryCode", species = "scientific_name", tests = c("capitals", 
        "centroids", "equal", "gbif", "zeros", "seas"), seas_ref = Crete)
## ===========================================================================##


## ===========================================================================##
## Keep only those that are OK
## ===========================================================================##
flags_rinats <- flags_rinat %>% mutate(Data = ifelse(.summary == "TRUE", "Clean", 
    "Flagged"), NAME_1 = "Crete", Dataset = "iNaturalist") %>% dplyr::filter(str_detect(Data, 
    "Cle")) %>% dplyr::select(scientific_name, longitude, latitude, Dataset)
## ===========================================================================##

You could run the above functions for a species that might interest you.

3.3.2 GBIF

Let’s now download data from GBIF for the same species. The first step is to check that the species is indeed referenced in GBIF. The function name_suggest() from the rgbif package searches all species that more or less fit the request. Only the species names from GBIF that exactly match the name of our species, are kept.

spp_petromarula <- name_suggest(q = "Petromarula pinnata", rank = "species", limit = 10000)$data$key[1]

The previous step identifies the taxonomic reference number of our species. Using this reference is the safest way to request species occurrences in GBIF.

Now let’s download all the occurrences that are present in GBIF and have geographical information (We will download max. 1000 occurrences)

## ===========================================================================##
## Download data
## ===========================================================================##

sp_occ <- occ_search(taxonKey = spp_petromarula, country = "GR", hasCoordinate = T, 
    limit = 1000, return = "data")$data %>% as_tibble()  ## Convert to tibble for pretty formatting
## ===========================================================================##


## ===========================================================================##
## To prevent any problem with the pathway it is a good practice to remove blank
## spaces from species names
## ===========================================================================##
sp_occ$name <- sub(" ", "_", sp_occ$name)
## ===========================================================================##


## ===========================================================================##
## Total number of occurrences
## ===========================================================================##
sort(table(sp_occ$name), decreasing = T)
## 
## Petromarula_pinnata (L.) A.DC.           Phyteuma_pinnatum L. 
##                            122                              1

It’s not unusual GBIF to contain (at least some) data from iNaturalist, so we will remove any occurrences that come from the latter database.

## ===========================================================================##
## Remove iNaturalist occurrences
## ===========================================================================##
sp_occ_rem <- sp_occ %>% filter(str_detect(references, "inatu"))
## ===========================================================================##


## ===========================================================================##
## How many occurrences in GBIF are also in iNaturalist?
## ===========================================================================##
sp_occ %>% filter(str_detect(references, "inatu")) %>% NROW()
## [1] 31

Next, we will have to perform again some checks in order to remove any erroneous or suspicious coordinates.

## ===========================================================================##
## Clean the coordinates
## ===========================================================================##
flags_gbif <- clean_coordinates(x = sp_occ_rem, lon = "decimalLongitude", lat = "decimalLatitude", 
    countries = "countryCode", species = "species", tests = c("capitals", "centroids", 
        "equal", "gbif", "zeros", "seas"), seas_ref = Crete)
## ===========================================================================##


## ===========================================================================##
## Keep only those that are OK
## ===========================================================================##
flags_gbif <- flags_gbif %>% mutate(Data = ifelse(.summary == "TRUE", "Clean", "Flagged"), 
    NAME_1 = "Crete", Dataset = "GBIF", longitude = decimalLongitude, latitude = decimalLatitude, 
    scientific_name = "Petromarula pinnata") %>% dplyr::filter(str_detect(Data, "Cle")) %>% 
    dplyr::select(scientific_name, longitude, latitude, Dataset)
## ===========================================================================##

You could run the above functions for a species that might interest you.

3.3.3 Join iNaturalist and GBIF data

We are now in place to join both datasets and convert them into a spatial object.

flags_both <- full_join(flags_gbif, flags_rinats) %>% st_as_sf(., coords = c("longitude", 
    "latitude"), crs = 4326)

Let’s visualise the result.

## ===========================================================================##
## First convert the Dataset column to a factor
## ===========================================================================##
flags_both$Dataset <- factor(flags_both$Dataset)
## ===========================================================================##


## ===========================================================================##
## Then add the centroids of each Cretan county
## ===========================================================================##
Crete_sf <- cbind(Crete_sf, st_coordinates(st_centroid(Crete_sf)))
## ===========================================================================##


## ===========================================================================##
## Then do the plotting
## ===========================================================================##
ggplot(Crete_sf) + geom_sf(fill = "antiquewhite1") + geom_sf() + # geom_label(data = Crete_sf, aes(X, Y, label = NAME_3), size = 2, fontface =
# 'bold') +
geom_sf(data = Crete_sf, fill = NA) + geom_sf(data = flags_both, size = 2, aes(color = Dataset, 
    fill = Dataset, shape = Dataset)) + annotation_scale(location = "bl", width_hint = 0.4) + 
    annotation_north_arrow(location = "bl", which_north = "true", pad_x = unit(0.75, 
        "in"), pad_y = unit(0.5, "in"), style = north_arrow_fancy_orienteering) + 
    coord_sf(xlim = c(23.4, 26.4), ylim = c(34.7, 35.7), expand = FALSE) + xlab("Longitude") + 
    ylab("Latitude") + ggtitle("Occurrences available online", subtitle = expression(italic("Petromarula pinnata"))) + 
    theme(panel.grid.major = element_line(color = gray(0.5), linetype = "dashed", 
        size = 0.5), panel.background = element_rect(fill = "aliceblue"))

3.3.4 Download GBIF and iNaturalist data for a specified polygon

We can also download species occurrences data from both databases by providing a polygon that we are interested in. Let’s see how that works for iNaturalist and GBIF. We shall do it first for the GBIF database.

## ============================================================================##
## First, create a vector of the species you are interested in
## ============================================================================##
splist <- c("Centaurea idaea", "Petromarula pinnata", "Dianthus juniperinus")
## ============================================================================##


## ============================================================================##
## Download data for them
## ============================================================================##
keys <- sapply(splist, function(x) name_suggest(x)$data$key[1], USE.NAMES = FALSE)
## ============================================================================##


## ============================================================================##
## Create an WKT file for the download
## ============================================================================##
ex <- raster::extent(Crete)
ex <- as(ex, "SpatialPolygons")
ex <- rgeos::writeWKT(ex)

## It seems that there is a problem with the number of digits, so we trimmed them
ex2 <- "POLYGON((23.47347 34.80068, 23.47347 35.69597, 26.35347 35.6959, 26.35347 34.80068, 23.47347 34.80068))"
## ============================================================================##


## ============================================================================##
## Download the data for these species from GBIF ----
## ============================================================================##
dat_ne <- occ_search(taxonKey = keys, hasCoordinate = T, limit = 1000, geometry = ex2)
## ============================================================================##


## ============================================================================##
## Convert it to a dataframe ----
## ============================================================================##
for (i in seq_along(dat_ne)) {
    dat_ne[[i]] <- dat_ne[[i]]$data
}

dat_ne <- lapply(dat_ne, "as.data.frame")

dat_ne <- bind_rows(dat_ne)

Try running the above functions for some species that might be present in Epirus (e.g., Ophrys helenae).

And then, in a similar manner for the iNaturalist database.

## ============================================================================##
## First, create a vector of the species you are interested in
## ============================================================================##
gen_list <- c("Centaurea idaea", "Petromarula pinnata")
## ============================================================================##


## ============================================================================##
## Create the desired extent
## ============================================================================##
## We are interested in Crete
extent(Crete)
## class      : Extent 
## xmin       : 23.47347 
## xmax       : 26.35347 
## ymin       : 34.80069 
## ymax       : 35.69597
bounds <- c(34.80069, 23.47347, 35.69597, 26.35347)
## ============================================================================##


## ============================================================================##
## Create an empty list to store them and run a for loop
## ============================================================================##
plants <- list()
for (i in seq_along(gen_list)) {
    plants[[i]] <- get_inat_obs(taxon_name = gen_list[[i]], bounds = bounds)
}
## ============================================================================##


## ============================================================================##
## Convert the list to a dataframe
## ============================================================================##
rinat_df <- do.call(rbind.data.frame, plants)
## ============================================================================##

3.3.5 Download GBIF data for a specific country

Finally, we can download all the plant species that occur in a country (or a specified polygon as exemplified earlier) and are present in GBIF.

##============================================================================##
## Use the name_suggest function to get the gbif taxon key
##============================================================================##
tax_key <- name_suggest(q = "Magnoliopsida", rank = "Class")
##============================================================================##


##============================================================================##
## Sometimes groups have multiple taxon keys, so we will
## check how many records are available for them
##============================================================================##
lapply(tax_key$data$key, "occ_count")
## [[1]]
## [1] 239454224
##============================================================================##
## The first one is relevant
##============================================================================##
tax_key <- tax_key$data$key
##============================================================================##


##============================================================================##
## How many plant taxa occur in Greece and are present in GBIF?
##============================================================================##
occ_count(tax_key, country = "GRC")
## [1] 0
##============================================================================##
## Let's download some plant species for Crete
##============================================================================##
data_gr <- occ_search(taxonKey = tax_key, 
                     return = "data", 
                     hasCoordinate = T, ## We only want data with coordinates
                     geometry = ex2, ## Here we define our study area polygon
                     # country = 'GR', ## If you want data for a specific country
                     limit = 3000)$data %>% ## We only want 3000 occurrences
  as_tibble() ## Convert to tibble for pretty formatting

3.4 Subset point data based on polygons

We can now see if there are any points occurring in a specific polygon (e.g., in Sfakia county). We will use spatial subsetting for this. Spatial subsetting is the process of selecting features of a spatial object based on whether or not they in some way relate in space to another object.

## ============================================================================##
## First convert the occurrence data from GBIF to a spatial object
## ============================================================================##
data_gr_sf <- st_as_sf(data_gr, coords = c("decimalLongitude", "decimalLatitude"), 
    crs = 4326)
## ============================================================================##


## ============================================================================##
## Then create an object representing Sfakia and another one representing Rethymno
## ============================================================================##
Sfakia <- subset(Crete, NAME_3 == "Sfakia") %>% st_as_sf()
Rethymno <- subset(Crete, NAME_3 == "Rethymno") %>% st_as_sf()
## ============================================================================##

## ============================================================================##
## Then use it for spatial subsetting to return all the occurrences in Sfakia and
## in Rethymno
## ============================================================================##
occs_Sfakia <- data_gr_sf[Sfakia, ]
occs_Rethymno <- data_gr_sf[Rethymno, ]
## ============================================================================##


## ============================================================================##
## Use it again to obtain the occurrences NOT in Sfakia
## ============================================================================##
occs_not_in_Sfakia <- data_gr_sf[Sfakia, , op = st_disjoint]
## ============================================================================##

Note the empty argument — denoted with , , — in the preceding code chunk is included to highlight op, the third argument in [ for sf objects. By using st_disjoint, we selected all the occurrences that are not present in Sfakia county.

Let’s see what we have created.

## ============================================================================##
## First we plot Crete
## ============================================================================##
plot(Crete)
## ============================================================================##


## ============================================================================##
## Then we plot Sfakia, after we convert the polygon into a polyline with st_cast
## ============================================================================##
plot(st_geometry(st_cast(Sfakia, "MULTILINESTRING")), add = T, col = "red")
## ============================================================================##


## ============================================================================##
## Then we plot the occurrences present in Sfakia
## ============================================================================##
plot(st_geometry(occs_Sfakia), add = T, col = "blue")
## ============================================================================##


## ============================================================================##
## And finally we plot the occurrences not present in Sfakia
## ============================================================================##
plot(st_geometry(occs_not_in_Sfakia), add = T, col = "darkgreen")

We can also calculate how many occurrences and how many species each Cretan county contains.

##============================================================================##
## First we will create a new spatial object containing both the point and the
## polygon data
##============================================================================##
aggr_data = Crete_sf %>%
  st_join(data_gr_sf) %>%
  group_by(NAME_3) %>% ## Here we group the data based on the county names
  
  ## First we create a variable containing the sum of the species in each county
  summarize(Species_Number = sum(NROW(unique(scientificName))),
            
            ## Then we just sum the number of occurrences in each county
            Occurrence_Number = n())
##============================================================================##


##============================================================================##
## Let's first plot the occurence number in each Cretan county
##============================================================================##
plot(aggr_data['Occurrence_Number'])

##============================================================================##
## And then plot the species number in each Cretan county
##============================================================================##
plot(aggr_data['Species_Number'])

4 Download raster data

Nowadays, there are several freely available raster databases for climate (e.g., WorldClim, CHELSA, Oscillayers, PaleoClim, CliMond, MerraClim, TerraClimate, EuMedClim, Envirem), soil (SoilGrids), topographical (EarthEnv, CGIAR, MERIT), geological (e.g., GLIM) and LULC (Land Cover/Land Use - Luh) data. Of course there are many other databases dealing with NVDI data, habitat heterogeneity, human population density etc.

Today we will deal with climate data from the WorldClim database, as these data can be accessed rather easily through the R interface. WorldClim provides data for 19 environmental variables that have to with temperature and precipitation:

WorldClim environmental variables

Var. Description
BIO1 Annual Mean Temperature
BIO2 Mean Diurnal Range (Mean of monthly (max temp – min temp))
BIO3 Isothermality (BIO2/BIO7) (* 100)
BIO4 Temperature Seasonality (standard deviation *100)
BIO5 Max Temperature of Warmest Month
BIO6 Min Temperature of Coldest Month
BIO7 Temperature Annual Range (BIO5-BIO6)
BIO8 Mean Temperature of Wettest Quarter
BIO9 Mean Temperature of Driest Quarter
BIO10 Mean Temperature of Warmest Quarter
BIO11 Mean Temperature of Coldest Quarter
BIO12 Annual Precipitation
BIO13 Precipitation of Wettest Month
BIO14 Precipitation of Driest Month
BIO15 Precipitation Seasonality (Coefficient of Variation)
BIO16 Precipitation of Wettest Quarter
BIO17 Precipitation of Driest Quarter
BIO18 Precipitation of Warmest Quarter
BIO19 Precipitation of Coldest Quarter

We can download future and past climate data from WorldClim as well, but we need to specify the Global Circulation Model, the Representative Pathway Scenario and the time period (2050 or 2070) we are interested in. Depending on our area of interest, we should choose carefully which GCMs are better suited for our intended analysis (McSweeney et al., 2015).

Please bear in mind that the following commands may take a while, since we will be downloading ca. 124 MB.

## ============================================================================##
## First, let's download current climate data for the whole planet
## ============================================================================##
current_climate <- getData("worldclim", var = "bio", res = 2.5)
## ============================================================================##


## ============================================================================##
## Then, let's download future climate data
## ============================================================================##
future_climate <- getData("CMIP5", var = "bio", res = 2.5, rcp = 85, model = "CC", 
    year = 70)

## ============================================================================##

Let’s now change the name of the variables in each object we created.

names(current_climate) <- c("bio_1", "bio_2", "bio_3", "bio_4", "bio_5", "bio_6", 
    "bio_7", "bio_8", "bio_9", "bio_10", "bio_11", "bio_12", "bio_13", "bio_14", 
    "bio_15", "bio_16", "bio_17", "bio_18", "bio_19")

names(future_climate) <- c("bio_1", "bio_2", "bio_3", "bio_4", "bio_5", "bio_6", 
    "bio_7", "bio_8", "bio_9", "bio_10", "bio_11", "bio_12", "bio_13", "bio_14", 
    "bio_15", "bio_16", "bio_17", "bio_18", "bio_19")

We need to correct the scale of bio_1 (i.e., Mean annual temperature). You can see here the reason why.

gain(current_climate$bio_1) = 0.1

gain(future_climate$bio_1) = 0.1

Next we will download elevation data for Greece.

altitude <- getData("alt", country = "GRC", mask = TRUE)

4.1 Crop raster data

We can crop the raster data we have just downloaded to the extent of Greece and then Crete. After we crop climate data to the extent of Greece, we can merge them with the altitudinal data. We will be able to do so, because then and only then will both objects (climate and altitude) have the same extent.

## ============================================================================##
## First for Greece
## ============================================================================##
current_climate_crop <- crop(current_climate, Greece_sf, snap = "in") %>% mask(., 
    Greece_sf)

altitude <- crop(altitude, Greece_sf, snap = "in") %>% mask(., Greece_sf) %>% 
## We are forcing the extent of altitude to be exactly the same as that of the
## climate data
resample(., current_climate_crop, "bilinear")
## ============================================================================##


## ============================================================================##
## Merge (stack) climate and altitude
## ============================================================================##
current_climate_crop <- stack(current_climate_crop, altitude)
## ============================================================================##


## ============================================================================##
## Then for Crete
## ============================================================================##
current_climate_Crete <- crop(current_climate_crop, Crete_sf, snap = "in") %>% mask(., 
    Crete_sf)
## ============================================================================##

You can do the same for the future climate data if you want.

Let’s now plot our results.

plot(current_climate_crop[[1]])
title("Mean annual temperature in Greece")

4.2 Calculate some statistics

The package raster contains functions for extracting descriptive statistics for entire rasters. Printing a raster object to the console by typing its name returns minimum and maximum values of a raster. The function summary() provides common descriptive statistics (minimum, maximum, quartiles and number of NAs). Further summary operations such as the standard deviation or custom summary statistics can be calculated with the function cellStats().

Raster value statistics can be visualized in a variety of ways. Specific functions such as boxplot(), density(), hist() and pairs() work also with raster objects

## ============================================================================##
## First, some main summary statistics
## ============================================================================##
cellStats(current_climate_crop, "summary")
##               bio_1       bio_2       bio_3     bio_4      bio_5      bio_6
## Min.        2.40000    55.00000    27.00000  4730.000   180.0000   -95.0000
## 1st Qu.    12.20000    92.00000    33.00000  5988.750   280.0000   -11.0000
## Median     14.50000   101.00000    34.00000  6516.000   298.0000    12.0000
##              bio_7       bio_8      bio_9     bio_10      bio_11     bio_12
## Min.      182.0000   -37.00000    94.0000   106.0000   -53.00000   385.0000
## 1st Qu.   262.0000    60.00000   205.0000   207.0000    37.00000   542.0000
## Median    288.0000    81.00000   229.0000   230.0000    60.00000   653.0000
##             bio_13      bio_14      bio_15     bio_16      bio_17      bio_18
## Min.       52.0000     0.00000    20.00000   138.0000     2.00000     2.00000
## 1st Qu.    74.0000     8.00000    32.00000   206.0000    34.00000    39.00000
## Median    103.0000    15.00000    50.00000   269.5000    65.00000    69.00000
##             bio_19  GRC_msk_alt
## Min.      110.0000    -2.484375
## 1st Qu.   184.0000   111.520996
## Median    250.0000   289.577637
##  [ reached getOption("max.print") -- omitted 4 rows ]
## ============================================================================##
## Then some density plots for some variables
## ============================================================================##
density(current_climate_crop[[1:2]])

## ============================================================================##
## And some boxplots for the same variables
## ============================================================================##
boxplot(current_climate_crop[[1:2]])

## ============================================================================##
## We can also calculate the correlation of those variables
## ============================================================================##
pairs(current_climate_crop[[1:2]])

You can do the same for other variables as well and you can use the function plot3D() to visualize the altitudinal data.

4.3 Correlation among raster data

As you may know, if we want to analyse some ecological data, we should not include in our analysis collinear independent variables (IVs). The correlation coefficient of our IVs should not exceed \(> 0.7\). We will now check the collinearity of our IVs and see which of them do not present any collinearity issues.

vifcor(current_climate_crop %>% as.data.frame(), th = 0.7)
## 15 variables from the 20 input variables have collinearity problem: 
##  
## bio_10 bio_16 bio_6 bio_17 bio_13 bio_1 bio_18 bio_7 bio_11 bio_9 bio_4 bio_19 bio_15 bio_5 bio_8 
## 
## After excluding the collinear variables, the linear correlation coefficients ranges between: 
## min correlation ( bio_12 ~ bio_2 ):  0.08518863 
## max correlation ( bio_3 ~ bio_2 ):  0.6767122 
## 
## ---------- VIFs of the remained variables -------- 
##     Variables      VIF
## 1       bio_2 7.050416
## 2       bio_3 5.365028
## 3      bio_12 2.204148
## 4      bio_14 4.909555
## 5 GRC_msk_alt 2.546695

4.4 Raster extraction

Raster extraction is the process of identifying and returning the values associated with a ‘target’ raster at specific locations, based on a (typically vector) geographic ‘selector’ object. The results depend on the type of selector used (points, lines or polygons) and arguments passed to the raster::extract() function.

The simplest example is extracting the value of a raster cell at specific points. For this purpose, we will use the point data we subsetted earlier, occs_Sfakia and occs_Rethymno. We will then compare if the species that are both present in these two spatial subsets have significantly different environmental preferences.

## ============================================================================##
## First we extract the data
## ============================================================================##
Sfakia_data <- raster::extract(current_climate_Crete, occs_Sfakia, method = "bilinear")
Rethymno_data <- raster::extract(current_climate_Crete, occs_Rethymno, method = "bilinear")
## ============================================================================##


## ============================================================================##
## Then we convert the sf objects to sp objects
## ============================================================================##
occs_Sfakia_sp <- occs_Sfakia %>% as_Spatial()
occs_Rethymno_sp <- occs_Rethymno %>% as_Spatial()
## ============================================================================##


## ============================================================================##
## Then we bind the extracted data with the respective spatial objects and convert
## them to spatial objects as well
## ============================================================================##
Sfakia_data <- data.frame(cbind(coordinates(occs_Sfakia_sp), Sfakia_data, occs_Sfakia_sp@data)) %>% 
    st_as_sf(., coords = c("coords.x1", "coords.x2"), crs = 4326)

Rethymno_data <- data.frame(cbind(coordinates(occs_Rethymno_sp), Rethymno_data, occs_Rethymno_sp@data)) %>% 
    st_as_sf(., coords = c("coords.x1", "coords.x2"), crs = 4326)
## ============================================================================##


## ============================================================================##
## We create a dataframe containing the mean values for each taxon for the annual
## temperature and precipitation
## ============================================================================##
Sfakia_data_mean <- Sfakia_data %>% group_by(scientificName) %>% summarise(bio1_mean = mean(bio_1), 
    bio12_mean = mean(bio_12)) %>% st_drop_geometry()

Rethymno_data_mean <- Rethymno_data %>% group_by(scientificName) %>% summarise(bio1_mean = mean(bio_1), 
    bio12_mean = mean(bio_12)) %>% st_drop_geometry()
## ============================================================================##


## ============================================================================##
## We make sure that we keep ONLY those species that are both present in Sfakia
## and Rethymno
## ============================================================================##
Sfakia_subset <- setDT(Sfakia_data_mean)[scientificName %chin% Rethymno_data_mean$scientificName]
Rethymno_subset <- setDT(Rethymno_data_mean)[scientificName %chin% Sfakia_data_mean$scientificName]
Sfakia_subset$scientificName <- factor(Sfakia_subset$scientificName)
Rethymno_subset$scientificName <- factor(Rethymno_subset$scientificName)
## ============================================================================##


## ============================================================================##
## Check if they are identical ----
## ============================================================================##
identical(sort(unique(Sfakia_subset$scientificName)), sort(unique(Rethymno_subset$scientificName)))
## ============================================================================##


## ============================================================================##
## Convert them to tibbles ----
## ============================================================================##
Sfakia_subset_tbl <- Sfakia_subset %>% arrange(scientificName) %>% as_tibble() %>% 
    mutate(County = "Sfakia")

Rethymno_subset_tbl <- Rethymno_subset %>% arrange(scientificName) %>% as_tibble() %>% 
    mutate(County = "Rethymno")

Sfakia_Rethymno <- full_join(Sfakia_subset_tbl, Rethymno_subset_tbl)
Sfakia_Rethymno$County <- factor(Sfakia_Rethymno$County)
## ============================================================================##

## ============================================================================##
## Finally, we conduct the ANOVA
## ============================================================================##
summary(aov(bio1_mean ~ County, data = Sfakia_Rethymno))
summary(aov(bio12_mean ~ County, data = Sfakia_Rethymno))
## ============================================================================##

Did we find any significant differences?

5 Homework

Please submit your answer at this email in PDF format.

  1. Download PA (Protected Areas) and biotic (i.e., for any type of organism: animal or plant) data for any country you want.
  2. Which county (not country) has the most species/occurrences? Present that visually.
  3. Calculate the percentage of land inside this country’s protected area system that are managed under different categories, as defined by IUCN.
  4. How many species are present in that country?
  5. Download raster data for the country you chose.
  6. Which abiotic variables are correlated (threshold: \(> 0.75\)) in that country?
  7. Subset the species occurring in any two counties you want. Do they have the same or different ecological preferences?

6 Appendix: R code

##===========================================================================##
## Do NOT run this code
##===========================================================================##
## Global options
options(Encoding="UTF-8")

library(knitr)
library(rmdformats)

## Global options
options(max.print="75")
opts_chunk$set(echo=TRUE,
                 cache=TRUE,
               prompt=FALSE,
               tidy=TRUE,
               # comment=NA,
               message=FALSE,
               warning=FALSE,
               eval = TRUE)
opts_knit$set(width=75)
##===========================================================================##
##===========================================================================##
## Install the main packages
##===========================================================================##
install.packages(c('tidyverse', 'rgbif', 'raster', 'data.table',  
                   'rgeos', 'rgdal', 'wdpar', 
                   'countrycode', 'devtools', 'sf', 'usdm',
                   'speciesgeocodeR', 'pacman', 'dismo',
                   'CoordinateCleaner', 'ggspatial', 'scales',
                   'scrubr', 'mapr', 'rasterVis'), dependencies = T)

devtools::install_github("rinat", "ropensci")
##===========================================================================##
##===========================================================================##
## Load them
##===========================================================================##
pacman::p_load(tidyverse, rgbif, raster, usdm,  
                   rgeos, rgdal, rinat, wdpar, 
                   countrycode, devtools, sf,
                   speciesgeocodeR, pacman, dismo,
                   CoordinateCleaner, ggspatial, scales,
                   scrubr, mapr, rasterVis, data.table)
##===========================================================================##
##===========================================================================##
## Download the data for Greece
##===========================================================================##
## There are three levels available, you are free to explore them
Greece <- getData('GADM', country = 'GRC', level = 3)
##===========================================================================##
##===========================================================================##
## Plot Greece
##===========================================================================##
plot(Greece)
##===========================================================================##
## First we need to see what is inside the object we created
##===========================================================================##
Greece
names(Greece)
Crete <- subset(Greece, NAME_1 == "Crete")
##===========================================================================##
## Plot Crete
##===========================================================================##
plot(Crete)

##===========================================================================##
## Download protected area data for Greece (it might take a while)
##===========================================================================##
gr_raw_pa_data <- wdpa_fetch("Greece", wait = TRUE)
##===========================================================================##

##===========================================================================##
## Let's see what we have downloaded
##===========================================================================##
# head(gr_raw_pa_data)
##===========================================================================##
## Reproject data to longitude/latitude for plotting
##===========================================================================##
gr_pa_data <- st_transform(gr_raw_pa_data, 4326)
pre {
  max-height: 400px;
  overflow-y: auto;
}

pre[class] {
  max-height: 400px;
}
##===========================================================================##
## Recalculate extent
##===========================================================================##
gr_pa_data <- gr_pa_data %>%
  mutate(AREA_KM2 = as.numeric(st_area(.)) * 1e-6) ## What is this number?
##===========================================================================##
##===========================================================================##
## Calculate the percentage of land inside the PA system for each IUCN category
##===========================================================================##
statistic <- gr_pa_data %>%
  as.data.frame() %>%
  dplyr::select(-geometry) %>%
  group_by(IUCN_CAT) %>%
  summarize(area_km = sum(AREA_KM2)) %>%
  ungroup() %>%
  mutate(percentage = (area_km / sum(area_km)) * 100) %>%
  arrange(desc(area_km))
##===========================================================================##


##===========================================================================##
## Print it
##===========================================================================##
statistic
##===========================================================================##
## Keep certain PAs only
##===========================================================================##
gr_pa_data_filtered <- gr_pa_data %>% filter(!str_detect(IUCN_CAT, 'Not'))
##===========================================================================##
##===========================================================================##
## First convert Greece and Crete to sf objects
##===========================================================================##
Greece_sf <- st_as_sf(Greece)
Crete_sf <- st_as_sf(Crete)
##===========================================================================##


##===========================================================================##
## Then do the plotting
##===========================================================================##
ggplot(Greece_sf) +
  geom_sf(fill = "antiquewhite1") +
  geom_sf() +
  geom_sf(data = Greece_sf, fill = NA) +
  geom_sf(aes(fill = IUCN_CAT), data = gr_pa_data_filtered, inherit.aes = FALSE) +
  theme(axis.title = element_blank(), legend.position = "bottom")
##===========================================================================##
## Restrict the analysis to Crete
##===========================================================================##
pa_crete <- st_intersection(gr_pa_data_filtered, Crete_sf)
##===========================================================================##


##===========================================================================##
## Then do the plotting
##===========================================================================##
ggplot(Crete_sf) +
  geom_sf(fill = "antiquewhite1") +
  geom_sf() +
  geom_sf(data = Crete_sf, fill = NA) +
  geom_sf(aes(fill = IUCN_CAT), data = pa_crete, inherit.aes = FALSE) +
  theme(axis.title = element_blank(), legend.position = "bottom")
##===========================================================================##
## Restrict the analysis to Crete
##===========================================================================##
pa_crete_all <- st_intersection(gr_pa_data, Crete_sf)
##===========================================================================##


##===========================================================================##
## Then do the plotting
##===========================================================================##
ggplot(Crete_sf) +
  geom_sf(fill = "antiquewhite1") +
  geom_sf() +
  geom_sf(data = Crete_sf, fill = NA) +
  geom_sf(aes(fill = IUCN_CAT), data = pa_crete_all, inherit.aes = FALSE) +
  theme(axis.title = element_blank(), legend.position = "bottom")

##===========================================================================##
## Download data and create an additional column
##===========================================================================##
petromarula_inat <- get_inat_obs(taxon_name = "Petromarula pinnata") %>% 
  mutate(countryCode = 'GR')
##===========================================================================##


##===========================================================================##
## Remove any rows without coordinates
##===========================================================================##
petromarula_inat <- petromarula_inat[complete.cases(petromarula_inat[ , 5:6]),]

## Why do we use the fifth and sixth column to remove any empty rows?
##===========================================================================##
##===========================================================================##
## Clean the coordinates
##===========================================================================##
flags_rinat <- clean_coordinates(x = petromarula_inat, 
                           lon = "longitude",
                           lat = "latitude", 
                           countries = "countryCode", 
                           species = "scientific_name",
                           tests = c("capitals", 
                                     "centroids",
                                     "equal", 
                                     "gbif", 
                                     "zeros",
                                     "seas"), 
                           seas_ref = Crete)  
##===========================================================================##


##===========================================================================##
## Keep only those that are OK
##===========================================================================##
flags_rinats <- flags_rinat %>% 
  mutate(Data = ifelse(.summary == 'TRUE', 
                       'Clean',
                       'Flagged'),
         NAME_1 = 'Crete',
         Dataset = 'iNaturalist') %>% 
  dplyr::filter(str_detect(Data, 'Cle')) %>% 
  dplyr::select(scientific_name,
                longitude,
                latitude,
                Dataset)
##===========================================================================##
spp_petromarula <- name_suggest(q = "Petromarula pinnata", rank = "species", limit = 10000)$data$key[1]
##===========================================================================##
## Download data
##===========================================================================##

sp_occ <- occ_search(taxonKey = spp_petromarula, 
                     country ='GR',
                     hasCoordinate = T,
                     limit = 1000, 
                     return = 'data')$data %>% 
  as_tibble() ## Convert to tibble for pretty formatting
##===========================================================================##


##===========================================================================##
## To prevent any problem with the pathway it is a good practice to remove blank
## spaces from species names
##===========================================================================##
sp_occ$name <- sub(" ", "_", sp_occ$name)
##===========================================================================##


##===========================================================================##
## Total number of occurrences
##===========================================================================##
sort(table(sp_occ$name), decreasing = T)
##===========================================================================##
## Remove iNaturalist occurrences
##===========================================================================##
sp_occ_rem <- sp_occ %>% filter(str_detect(references, 'inatu'))
##===========================================================================##


##===========================================================================##
## How many occurrences in GBIF are also in iNaturalist?
##===========================================================================##
sp_occ %>% filter(str_detect(references, 'inatu')) %>% NROW()
##===========================================================================##
## Clean the coordinates
##===========================================================================##
flags_gbif <- clean_coordinates(x = sp_occ_rem, 
                           lon = "decimalLongitude",
                           lat = "decimalLatitude", 
                           countries = "countryCode", 
                           species = "species",
                           tests = c("capitals", 
                                     "centroids",
                                     "equal", 
                                     "gbif", 
                                     "zeros",
                                     "seas"), 
                           seas_ref = Crete)
##===========================================================================##


##===========================================================================##
## Keep only those that are OK
##===========================================================================##
flags_gbif <- flags_gbif %>% 
  mutate(Data = ifelse(.summary == 'TRUE', 
                       'Clean',
                       'Flagged'),
         NAME_1 = 'Crete',
         Dataset = 'GBIF',
         longitude = decimalLongitude,
         latitude = decimalLatitude,
         scientific_name  = 'Petromarula pinnata') %>% 
  dplyr::filter(str_detect(Data, 'Cle')) %>% 
  dplyr::select(scientific_name,
                longitude,
                latitude,
                Dataset) 
##===========================================================================##
flags_both <- full_join(flags_gbif, flags_rinats) %>% 
  st_as_sf(., coords = c('longitude', 'latitude'), 
           crs = 4326)
##===========================================================================##
## First convert the Dataset column to a factor
##===========================================================================##
flags_both$Dataset <- factor(flags_both$Dataset)
##===========================================================================##


##===========================================================================##
## Then add the centroids of each Cretan county
##===========================================================================##
Crete_sf <- cbind(Crete_sf, st_coordinates(st_centroid(Crete_sf)))
##===========================================================================##


##===========================================================================##
## Then do the plotting
##===========================================================================##
ggplot(Crete_sf) +
  geom_sf(fill = "antiquewhite1") +
  geom_sf() +
  # geom_label(data = Crete_sf, 
  #            aes(X, Y, label = NAME_3), 
  #            size = 2, 
  #            fontface = "bold") +
  geom_sf(data = Crete_sf, fill = NA) +
  geom_sf(data = flags_both,
          size = 2,
          aes(color = Dataset, fill = Dataset, shape = Dataset)) +
  annotation_scale(location = "bl", 
                   width_hint = 0.4) +
  annotation_north_arrow(location = "bl", 
                         which_north = "true", 
                         pad_x = unit(0.75, "in"), 
                         pad_y = unit(0.5, "in"),
                         style = north_arrow_fancy_orienteering) +
  coord_sf(xlim = c(23.4, 26.4), 
           ylim = c(34.7, 35.7), expand = FALSE) + 
  xlab("Longitude") + 
  ylab("Latitude") +
  ggtitle("Occurrences available online", 
          subtitle = expression(italic("Petromarula pinnata"))) +
  theme(panel.grid.major = element_line(color = gray(0.5), 
                                        linetype = "dashed", 
                                        size = 0.5), 
        panel.background = element_rect(fill = "aliceblue"))
## ============================================================================##
## First, create a vector of the species you are interested in
## ============================================================================##
splist <- c("Centaurea idaea", "Petromarula pinnata", "Dianthus juniperinus")
## ============================================================================##


## ============================================================================##
## Download data for them
## ============================================================================##
keys <- sapply(splist, function(x) name_suggest(x)$data$key[1], USE.NAMES = FALSE)
## ============================================================================##


## ============================================================================##
## Create an WKT file for the download
## ============================================================================##
ex <- raster::extent(Crete)
ex <- as(ex, "SpatialPolygons")
ex <- rgeos::writeWKT(ex)

## It seems that there is a problem with the number of digits, so we trimmed them
ex2 <- "POLYGON((23.47347 34.80068, 23.47347 35.69597, 26.35347 35.6959, 26.35347 34.80068, 23.47347 34.80068))"
## ============================================================================##


## ============================================================================##
## Download the data for these species from GBIF ----
## ============================================================================##
dat_ne <- occ_search(taxonKey = keys, hasCoordinate = T, limit = 1000, geometry = ex2)
## ============================================================================##


## ============================================================================##
## Convert it to a dataframe ----
## ============================================================================##
for (i in seq_along(dat_ne)) {
    dat_ne[[i]] <- dat_ne[[i]]$data
}

dat_ne <- lapply(dat_ne, "as.data.frame")

dat_ne <- bind_rows(dat_ne)
##===========================================================================##
## Sanity checks
##===========================================================================##
gr_boundary_data <- Greece %>% st_as_sf() %>% 
  st_set_precision(1000) %>%
  lwgeom::st_make_valid() %>%
  st_set_precision(1000) %>%
  st_combine() %>%
  st_union() %>%
  st_set_precision(1000) %>%
  lwgeom::st_make_valid() %>%
  st_transform(st_crs(gr_pa_data)) %>%
  lwgeom::st_make_valid()
##===========================================================================##


##===========================================================================##
## clip Greece's protected areas to the coastline
##===========================================================================##
gr_pa_data <- gr_pa_data %>%
  filter(MARINE == "terrestrial") %>%
  st_intersection(gr_boundary_data) %>%
  rbind(gr_pa_data %>%
          filter(MARINE == "marine") %>%
          st_difference(gr_boundary_data)) %>%
  rbind(gr_pa_data %>% filter(!MARINE %in% c("terrestrial",
                                              "marine"))) 
##===========================================================================##
##============================================================================##
## Use the name_suggest function to get the gbif taxon key
##============================================================================##
tax_key <- name_suggest(q = "Magnoliopsida", rank = "Class")
##============================================================================##


##============================================================================##
## Sometimes groups have multiple taxon keys, so we will
## check how many records are available for them
##============================================================================##
lapply(tax_key$data$key, "occ_count")


##============================================================================##
## The first one is relevant
##============================================================================##
tax_key <- tax_key$data$key
##============================================================================##


##============================================================================##
## How many plant taxa occur in Greece and are present in GBIF?
##============================================================================##
occ_count(tax_key, country = "GRC")


##============================================================================##
## Let's download some plant species for Crete
##============================================================================##
data_gr <- occ_search(taxonKey = tax_key, 
                     return = "data", 
                     hasCoordinate = T, ## We only want data with coordinates
                     geometry = ex2, ## Here we define our study area polygon
                     # country = 'GR', ## If you want data for a specific country
                     limit = 3000)$data %>% ## We only want 3000 occurrences
  as_tibble() ## Convert to tibble for pretty formatting
##============================================================================##
## First convert the occurrence data from GBIF to a spatial object
##============================================================================##
data_gr_sf <- st_as_sf(data_gr, 
                    coords = c('decimalLongitude', 
                               'decimalLatitude'), 
                               crs = 4326)
##============================================================================##


##============================================================================##
## Then create an object representing Sfakia and another one representing
## Rethymno
##============================================================================##
Sfakia <- subset(Crete, NAME_3 == 'Sfakia') %>% st_as_sf()
Rethymno <- subset(Crete, NAME_3 == 'Rethymno') %>% st_as_sf()
##============================================================================##

##============================================================================##
## Then use it for spatial subsetting to return all the occurrences in Sfakia and
## in Rethymno
##============================================================================##
occs_Sfakia <- data_gr_sf[Sfakia, ]
occs_Rethymno <- data_gr_sf[Rethymno, ]
##============================================================================##


##============================================================================##
## Use it again to obtain the occurrences NOT in Sfakia
##============================================================================##
occs_not_in_Sfakia <- data_gr_sf[Sfakia, , op = st_disjoint]
##============================================================================##
##============================================================================##
## First we plot Crete
##============================================================================##
plot(Crete)
##============================================================================##


##============================================================================##
## Then we plot Sfakia, after we convert the polygon into a polyline with st_cast
##============================================================================##
plot(st_geometry(st_cast(Sfakia, 'MULTILINESTRING')), add = T, col = 'red')
##============================================================================##


##============================================================================##
## Then we plot the occurrences present in Sfakia
##============================================================================##
plot(st_geometry(occs_Sfakia), add = T, col = 'blue')
##============================================================================##


##============================================================================##
## And finally we plot the occurrences not present in Sfakia
##============================================================================##
plot(st_geometry(occs_not_in_Sfakia), add = T, col = 'darkgreen')
##============================================================================##
## First we will create a new spatial object containing both the point and the
## polygon data
##============================================================================##
aggr_data = Crete_sf %>%
  st_join(data_gr_sf) %>%
  group_by(NAME_3) %>% ## Here we group the data based on the county names
  
  ## First we create a variable containing the sum of the species in each county
  summarize(Species_Number = sum(NROW(unique(scientificName))),
            
            ## Then we just sum the number of occurrences in each county
            Occurrence_Number = n())
##============================================================================##


##============================================================================##
## Let's first plot the occurence number in each Cretan county
##============================================================================##
plot(aggr_data['Occurrence_Number'])


##============================================================================##
## And then plot the species number in each Cretan county
##============================================================================##
plot(aggr_data['Species_Number'])
##============================================================================##
## First, let's download current climate data for the whole planet
##============================================================================##
current_climate <- getData('worldclim', var = 'bio', res = 2.5)
##============================================================================##


##============================================================================##
## Then, let's download future climate data
##============================================================================##
future_climate <- getData('CMIP5', var = 'bio', res = 2.5, 
                          rcp = 85, model = 'CC', year = 70)

##============================================================================##
names(current_climate) <- c('bio_1', 'bio_2', 'bio_3', 'bio_4', 'bio_5', 
                               'bio_6', 'bio_7', 'bio_8', 'bio_9', 'bio_10', 
                               'bio_11', 'bio_12', 'bio_13', 'bio_14', 'bio_15', 
                               'bio_16', 'bio_17', 'bio_18', 'bio_19')

names(future_climate) <- c('bio_1', 'bio_2', 'bio_3', 'bio_4', 'bio_5', 
                               'bio_6', 'bio_7', 'bio_8', 'bio_9', 'bio_10', 
                               'bio_11', 'bio_12', 'bio_13', 'bio_14', 'bio_15', 
                               'bio_16', 'bio_17', 'bio_18', 'bio_19')
gain(current_climate$bio_1) = 0.1

gain(future_climate$bio_1) = 0.1
altitude <- getData('alt', country = 'GRC', mask = TRUE) 
##============================================================================##
## First for Greece
##============================================================================##
current_climate_crop <- crop(current_climate, 
                            Greece_sf, 
                            snap = 'in') %>% 
  mask(., Greece_sf)

altitude <- crop(altitude, 
                 Greece_sf, 
                 snap = 'in') %>% 
  mask(., Greece_sf) %>% 
  
  ## We are forcing the extent of altitude to be exactly the same as that of the
  ## climate data
  resample(., current_climate_crop, 'bilinear')
##============================================================================##


##============================================================================##
## Merge (stack) climate and altitude
##============================================================================##
current_climate_crop <- stack(current_climate_crop, altitude)
##============================================================================##


##============================================================================##
## Then for Crete
##============================================================================##
current_climate_Crete <- crop(current_climate_crop,
                              Crete_sf,
                              snap = 'in') %>% 
  mask(., Crete_sf)
##============================================================================##
plot(current_climate_crop[[1]])
title('Mean annual temperature in Greece')
##============================================================================##
## First, some main summary statistics
##============================================================================##
cellStats(current_climate_crop, 'summary')


##============================================================================##
## Then some density plots for some variables
##============================================================================##
density(current_climate_crop[[1:2]])


##============================================================================##
## And some boxplots for the same variables
##============================================================================##
boxplot(current_climate_crop[[1:2]])

##============================================================================##
## We can also calculate the correlation of those variables
##============================================================================##
pairs(current_climate_crop[[1:2]])
vifcor(current_climate_crop %>% as.data.frame(), th = 0.7)
##============================================================================##
## First we extract the data
##============================================================================##
Sfakia_data <- raster::extract(current_climate_Crete, occs_Sfakia, method = "bilinear")
Rethymno_data <- raster::extract(current_climate_Crete, occs_Rethymno, method = "bilinear")
##============================================================================##


##============================================================================##
## Then we convert the sf objects to sp objects
##============================================================================##
occs_Sfakia_sp <- occs_Sfakia %>% as_Spatial()
occs_Rethymno_sp <- occs_Rethymno %>% as_Spatial()
##============================================================================##


##============================================================================##
## Then we bind the extracted data with the respective spatial objects and
## convert them to spatial objects as well
##============================================================================##
Sfakia_data <- data.frame(cbind(coordinates(occs_Sfakia_sp), 
                                Sfakia_data, 
                                occs_Sfakia_sp@data)) %>% 
  st_as_sf(., coords = c('coords.x1', 'coords.x2'), 
           crs = 4326)

Rethymno_data <- data.frame(cbind(coordinates(occs_Rethymno_sp), 
                                  Rethymno_data, 
                                  occs_Rethymno_sp@data)) %>% 
  st_as_sf(., coords = c('coords.x1', 'coords.x2'),
           crs = 4326)
##============================================================================##


##============================================================================##
## We create a dataframe containing the mean values for each taxon for the annual
## temperature and precipitation
##============================================================================##
Sfakia_data_mean <- Sfakia_data %>% 
  group_by(scientificName) %>% 
  summarise(bio1_mean = mean(bio_1),
            bio12_mean = mean(bio_12)) %>% 
  st_drop_geometry()

Rethymno_data_mean <- Rethymno_data %>% 
  group_by(scientificName) %>% 
  summarise(bio1_mean = mean(bio_1),
            bio12_mean = mean(bio_12)) %>% 
  st_drop_geometry()
##============================================================================##


##============================================================================##
## We make sure that we keep ONLY those species that are both present in Sfakia
## and Rethymno
##============================================================================##
Sfakia_subset <- setDT(Sfakia_data_mean)[scientificName %chin% Rethymno_data_mean$scientificName]
Rethymno_subset <- setDT(Rethymno_data_mean)[scientificName %chin% Sfakia_data_mean$scientificName]
Sfakia_subset$scientificName <- factor(Sfakia_subset$scientificName)
Rethymno_subset$scientificName <- factor(Rethymno_subset$scientificName)
##============================================================================##


##============================================================================##
## Check if they are identical ----
##============================================================================##
identical(sort(unique(Sfakia_subset$scientificName)), 
          sort(unique(Rethymno_subset$scientificName)))
##============================================================================##


##============================================================================##
## Convert them to tibbles ----
##============================================================================##
Sfakia_subset_tbl <- Sfakia_subset %>% 
  arrange(scientificName) %>%
  as_tibble() %>% 
  mutate(County = 'Sfakia')

Rethymno_subset_tbl <- Rethymno_subset %>%
  arrange(scientificName) %>% 
  as_tibble() %>% 
  mutate(County = 'Rethymno')

Sfakia_Rethymno <- full_join(Sfakia_subset_tbl, Rethymno_subset_tbl)
Sfakia_Rethymno$County <- factor(Sfakia_Rethymno$County)
##============================================================================##

##============================================================================##
## Finally, we conduct the ANOVA
##============================================================================##
summary(aov(bio1_mean ~ County, data = Sfakia_Rethymno))
summary(aov(bio12_mean ~ County, data = Sfakia_Rethymno))
##============================================================================##
##===========================================================================##
## Do NOT run this code
##===========================================================================##
labs = knitr::all_labels(echo = TRUE)
##===========================================================================##

References

Bivand, R.S., Pebesma, E.J., Gomez-Rubio, V., & Pebesma, E.J. (2008) Applied spatial data analysis with r. Springer,
Lovelace, R., Nowosad, J., & Muenchow, J. (2019) Geocomputation with r. CRC Press,
Maldonado, C., Molina, C.I., Zizka, A., Persson, C., Taylor, C.M., Albán, J., Chilquillo, E., Rønsted, N., & Antonelli, A. (2015) Estimating species diversity and distribution in the era of b ig d ata: To what extent can we trust public databases? Global Ecology and Biogeography, 24, 973–984.
McSweeney, C.F., Jones, R.G., Lee, R.W., & Rowell, D.P. (2015) Selecting CMIP5 GCMs for downscaling over multiple regions. Clim. Dyn., 44, 3237–3260.

  1. There are two ways to do that fast:
    1. library(pacman); p_load('package_name_1', 'package_name_2')
    2. install.packages(c('package_name_1', 'package_name_2'), dependecies = TRUE)
    Remember that if you use the first way, you need to install package pacman first.↩︎

  2. If you want to deepen your understanding on this, please click here↩︎

  3. You are strongly advised to have a look to both these books↩︎

  4. With no coordinates to be exact↩︎

 

Delivered to you by Dr. Kostas Kougioumoutzis

kkougiou@aua.gr